Activities Completed During This Reporting Period

During Year Two of the project Systems Interoperability and Collaborative Development for 
Web Archiving, all main activities to be completed were successfully accomplished. However, a 
one-year, no-cost extension was requested and approved to enhance project deliverables through 
additional community building and knowledge sharing activities supported by leftover grant 
funds originally earmarked for attendee travel to the project’s National Symposium. These 
additional activities, and the extension, will significantly improve some of the project’s
published outputs and white papers. They will also allow for additional stakeholder meetings to
review advancements of the project’s API-based system integrations and improve the project’s
research outcomes. Accordingly, the publication of the research papers planned for the end of
Year Two of the (originally two-year) project has been pushed into Year Three. These
documents were created in draft form in Year Two but will be published in the extended Year
Three after further research and meetings are conducted to better inform their composition.

At a high level, Year Two work had the following outcomes. 

• A two-day “National Symposium on Web Archiving” with over 40 of the largest and/or 
most technically accomplished web archiving programs in the U.S. in attendance. 

• 11 conference/community presentations, forums, or discussion groups delivered 
including at IIPC, CNI, SAA, LDCX, JCDL, DLF, and others. 

• 10 research dataset requests using the project’s API and dataset production pipeline for 
delivering research datasets generated from web archives. 

• 4 institutional users involved in testing the project’s data transfer APIs. 

• 3 hands-on workshops with the library, archive, and research community promoting the 
project’s APIs for use in computational research, including iPRES, WARCshop, 
Archives Unleashed, and others. 

• 3 project demonstration videos and sets of technical and user documentation. 

• 3 draft papers circulated to the community for input and 4 blog posts written. 

• 2 integrations of project APIs into production preservation workflows at project partners. 

• 1 community survey conducted. 

• 1 Technical Working Group meeting convened. 

• 1 online training webinar for SAA’s Web Archiving Section. 



Interim Report - Year Two, Internet Archive, LG-71-15-0174 


The project continued operating under the working name of the “WASAPI” project (Web
Archiving Systems APIs), for clear identification, community building, and to establish and
promote a foundational network of contributing and participating institutions that can continue
to work together beyond the grant period.

The second year of WASAPI focused on three areas of work within web archiving. 

1. Network development through a mixture of symposia, workshops, and meetings to
operationalize the project’s technical models and developments, as well as the production 
rollout and expansion of a range of APIs, utilities, research tools, and supporting 
materials such as technical documentation, demonstration videos, and tested use cases. 

2. Formalization of the project’s research and findings through a number of forthcoming 
publications outlining API-based system interoperability through both production-grade 
and R&D-level APIs, beta and released third-party systems integrations, and second-level 
services building on the project’s work and tools. 

3. Summarizing community knowledge and requirements sharing around the broader social 
effort to foster collaborative technology development, and modeling successful social, 
governance, and shared social infrastructure for catalyzing joint engineering efforts 
around web archiving technologies. 

All Year Two goals were achieved, though, as noted above, the release of some publications was 
delayed to be informed by additional meetings utilizing previously unspent grant funds. It was 
decided to focus the remaining funding on a no-cost extension to support additional stakeholder 
meetings to better inform research and findings originally expected to be published in Year Two, 
but now slated for a Year Three release. This will enable additional external contributions to the
project, improved documentation of project deliverables, and additional time for community
building and for the WASAPI community to identify future work beyond the grant period.

In Year Two, activities were organized around the grant’s three main areas of research and 
development work: 

1) What are the attributes of a community model that can support sustainable and broad-based 
collaborative web archiving technology development? 

The main activity of Year Two for this area of work was the hosting of the National Symposium 
on Web Archiving Interoperability at the Internet Archive on February 21-22, 2017. This event 
was attended by over 40 organizations reflecting the largest and most active web archiving 
programs in the U.S. (with some additional Canadian participants). The attendees represented a 
broad spectrum of organizational types including government agencies, museums, university
libraries and archives, graduate programs, archiving services, cultural organizations, and research
institutions. The agenda featured a range of presentations on the current state of APIs in web
archiving and digital library programs, including current uses of existing APIs for connecting 
search, indexing, and discovery services, as well as the state of API development in the broader 
digital library community. The invited presentations provided provocation for a breakout- and 
discussion-centric meeting agenda aimed at fostering conversation, networking, socialization, 
and partnership exploration - all key activities to seed a grassroots effort at preliminary 
community building for advancing technical work across web archiving programs. 

Over the course of breakout discussions, presentations, and side meetings, a number of themes 
and challenges emerged. Some of the most common issues that surfaced were: 

• Web archiving lacks a centralized convening national meeting. The international focus 
and comparatively high membership cost of the IIPC makes it practically inaccessible as 
a common forum for coordination and networking among a national web archiving 
community. National web archiving meetings have periodically taken place on the
impetus of individual institutions or projects, or as sub-tracks of larger meetings, but not
as recurring events. How can such meetings be regularized, scaled, and sustained?

• Beyond a few key players, institutional commitment to web archiving as a core collection 
and preservation activity lags behind all other digital library initiatives. Technical 
collaboration is inhibited by the fact that only two or three institutions dedicate one or
more FTE developers to scalable open-source web archiving development. In a field of
such fractional or negligible institutional commitment, what models best support the 
minimal availability of engineers to participate in distributed development projects? 

• Those few institutions with engineering capacity often have interests or commitments
(service providing, student research support) that do not always overlap with
academic-oriented library collecting programs. How can affiliate interest be prioritized 
for a cross-pollination of broad community need and sparse engineering availability? 

• The technical complexity and scale challenges of web archiving introduce two challenges 
that are sometimes in conflict: the need for infrastructure/architecture composed of many 
APIs for more modularity, and the co-dependency of many of these APIs. For example, 
some attendees noted that the use of data transfer APIs likely necessitates the
simultaneous use of descriptive metadata APIs, even though these functions exist in 
separate parts of the lifecycle and are maintained in separate systems. How will follow-on 
API development after the WASAPI grant address these dependencies? Should it be 
through more systems integration and/or R&D around data modeling and standards? 

• Successful community-building will require a range of social and technical tools, from 
meetings and communication channels to use cases and project demos. The challenge of 
the former is the existing glut of library/archives conferences but lack of events focused
on web archiving. The challenge of the latter is that the fractional staffing and resources
for web archiving limit the adoption necessary to document and sustain collaboration, 
documentation, and published results that can be promoted. 

Further reflections, meeting details, and outcomes will appear in a forthcoming “National
Symposium” report, to be published in Year Three, that includes an Executive Summary by
the grant principals, as well as reflective essays authored by a range of event attendees. Year
Two of the project also featured a continuation of the significant conference presentations and 
project promotion by grant principals and staff at the project’s institutions. These talks and 
events promoted the project’s work, solicited the involvement of the broader digital library 
community in API-based interoperability, and built community awareness. Year Two sought to 
expand on Year One’s community building by extending outreach efforts to include additional 
researcher communities and promote the downstream uses, data mining, and computational 
research enabled by the project’s work on derivative dataset delivery infrastructure via the 
WASAPI APIs. Project staff delivered 11 presentations at a variety of conferences, promoting 
the project across some key constituencies: web archiving practitioners, national and 
international web archiving programs (i.e., those with technical resources), researchers and 
research support services and librarians, and data-driven and digital library efforts. 

Events included in the grant team’s work: 

• National Symposium on Web Archiving Interoperability (February 2017) 

• Archives Unleashed (February 2017) 

• LDCX (March 2017) 

• Collections as Data Conference (March 2017) 

• Web Archiving Section of the Society of American Archivists (March 2017) 

• ARLIS-NA (April 2017) 

• WARCshop at Penn State (April 2017) 

• IIPC Web Archiving Conference (June 2017) - multiple sessions 

• Presentations at Bibliothèque nationale de France & Koninklijke Bibliotheek (July 2017)

• Archive-It Annual Partner Meeting (July 2017) 

• Archives & Records Association (UK - August 2017) 

• Digital POWRR Institute (November 2017) 

• Dodging the Memory Hole: Saving Online News (November 2017) 

• CNI Fall 2017 Meeting (December 2017) 

Many of the aforementioned events featured an in-depth focus on the WASAPI work and notable 
feedback from the community. At the IIPC Web Archiving Conference, the session included a 
presentation by grant principals, demos of the APIs, and an open session for discussion on the 
potential for improved interoperability across the web archiving ecosystem. The WASAPI work 
helped fuel the creation and agenda of a “Tools Portfolio” group within the IIPC to organize
technical work and a number of initiatives aimed at bringing the WASAPI project’s findings to
an international community. It is already evident that the 2018 IIPC Web Archiving Conference 
will have a dedicated focus on APIs in many sessions, and we foresee the event providing 
amplification of our project’s outputs in a way that has benefits both nationally and 
internationally. Other events helped promote how the WASAPI APIs can facilitate easier 
research use of web archives by showing the outputs of API-based research tools, including 
online news analysis and the tools and hackathons organized by the Archives Unleashed project. 

For dissemination and reporting, the WASAPI team continued to use a number of channels and 
portals for documenting the project’s work and for encouraging the involvement of the larger 
community. The project’s main working space, including links to publications, but especially
code and technical documentation, is on GitHub
(https://github.com/WASAPI-Community/data-transfer-apis). The project’s document and media
repository is in the Internet Archive (https://archive.org/details/wasapi). Project communication
takes place via a Slack team (https://wasapi.slack.com/), which has over 50 members. Additional
outreach and dissemination tools include a Google Group and communication via multiple 
listservs (IIPC, SAA lists, Archive-It). Overall, Year Two of the WASAPI project was successful 
in forming and launching a community model focused on building API-based interoperable 
systems for web archiving and included the participation of curatorial, technical, research, and 
the larger digital library and archives community in its work. The symposium and affiliated 
events provided much greater evidence that can be included in the project’s research reports and 
recommendations for scaling a national community invested in technical issues in web archiving. 
Continued conference events, stakeholder meetings, and outreach in Year Three will build upon
this work and set the stage for the project’s post-grant growth and sustainability.

2) What are the community needs and possibilities for the planned open API to facilitate 
transfer of web archive data between distributed systems and what other prospective 
APIs does it point to? 

The work of Year One in producing two project-specific surveys assessing the state of 
preservation data transfer in the web archiving community helped inform use cases around
community need. There were two primary areas of work across the grant partners in Year Two
that advanced this strategic project goal. The first was social, primarily consisting of the
convening of the National Symposium, which provided both a forum for community members to
detail their local uses of existing web archiving APIs and a chance to articulate and document the
many use cases for which the WASAPI project and future API work could improve existing
workflows and enable new processes and functionalities. In addition, focused conversations at many of
the outreach and promotion events, and inclusion of continued information-seeking questions in
annual survey mechanisms by NDSA and Archive-It, helped advance this work. The second was
technical, consisting primarily of engineering, testing, and iterative improvement of the project’s 
suite of APIs and utilities for web archive data transfer to local repositories and for downstream 
use. 

The social work elucidated a number of community needs, including: 

• Need for more robust tools with better metadata for downloading web archive data from 
service providers to local preservation repositories. 

• Need for better access to externally-stored WARC files for use in creating custom local 
discovery systems via local indexing. 

• Need for APIs to facilitate multi-institutional aggregated collections with federated 
search and index building (and similarly, the need to enable easier inter-institutional 
transfer of web archive data to avoid duplicative archiving activities, but enable 
distributive custodialism). 

• Need for APIs upon which to build second-level services, such as quality assurance, 
capture improvement, format migrations (or preservation actions in general), ingest 
management, data analytics, indexing, et cetera. 

• Need for transfer and local processing of WARC files for enabling research and data 
mining use cases, such as powering Jupyter Notebooks or data extraction. 

Discussion on prospective future WASAPI APIs beyond this grant fell into two categories:

• Metadata APIs: Though “metadata” is a vague term in web archiving (where technical 
and administrative metadata tend to endlessly proliferate), the most common inference of 
the term in the context of these conversations was for “descriptive” and “seed” metadata. 
This meant, broadly, user-applied descriptive metadata, such as subject headings,
collection descriptions, or other human classification metadata, and seed metadata,
meaning, mostly, settings applied to specific seeds or URLs being archived, such as
frequency of capture, scoping rules, and other acquisition management parameters.

• Crawl Configuration APIs: These prospective APIs included information on more 
granular aspects of crawling configuration, such as robots.txt handling, seed type 
information (e.g. archiving one URL, one URL plus embedded content, one URL plus all 
URLs linked therefrom, et cetera), and capture limits or expansions (such as exclusions 
of files over a specific size, host-level rules, automated behavior or other utilities used by 
the crawler, et cetera). 
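
To make the two prospective categories more concrete, the following is a minimal Python sketch of what a combined seed record might look like if exposed through such APIs. It is an illustration only: the class, field names, and enumeration values are hypothetical and are not drawn from any existing Archive-It, LOCKSS, or WASAPI specification.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List, Optional


class SeedType(Enum):
    # Hypothetical labels for the seed types described in the text.
    ONE_PAGE = "one-page"
    ONE_PAGE_PLUS_EMBEDS = "one-page-plus-embeds"
    ONE_PAGE_PLUS_LINKS = "one-page-plus-links"


@dataclass
class SeedRecord:
    # Descriptive and seed metadata (first category).
    url: str
    subjects: List[str] = field(default_factory=list)
    capture_frequency: str = "monthly"
    # Crawl configuration (second category).
    seed_type: SeedType = SeedType.ONE_PAGE
    obey_robots_txt: bool = True
    max_file_bytes: Optional[int] = None  # None means no size exclusion


def to_json(seed: SeedRecord) -> str:
    """Serialize a seed record to JSON, writing the seed type as its label."""
    data = asdict(seed)
    data["seed_type"] = seed.seed_type.value
    return json.dumps(data, sort_keys=True)


record = SeedRecord(
    url="https://example.org/",
    subjects=["Elections"],
    seed_type=SeedType.ONE_PAGE_PLUS_LINKS,
    max_file_bytes=100_000_000,
)
payload = json.loads(to_json(record))
```

Keeping descriptive fields and crawl settings in one record is only one possible design; the discussions summarized above treated them as separate prospective APIs, which would suggest splitting the record in two.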

Metadata APIs already exist, or are in development by some services, primarily Archive-It and 
LOCKSS, but mostly for powering web applications and not for the explicit use case of enabling 
external user integrations (though these are often possible second-level uses). Work on
metadata APIs is certain to be a follow-on activity to this specific grant and is already on the
engineering roadmap of Archive-It. Crawl configuration APIs also exist, but mostly to integrate
bespoke and “hidden” functions in the technical pipeline of web archiving, such as WARC 
writing, browser rendering, or crawling cluster management and queue distribution - activities 
largely of import only to archiving technology management organizations (i.e. services) and not 
yet of interest to individual archiving institutions, the majority of which are contracting out these 
activities. Standardization of crawl configuration does have the potential to facilitate distributed, 
multi-technology, web archiving approaches that could improve the scale and quality of some 
web archiving activities, so could have wide community benefit, even if the work is organized 
among a few service providers, tool developers, and technical partners. 

The technical work involved the building, testing, and iterative improvement of the APIs 
developed as part of this project. This elucidated a number of unforeseen issues and possibilities: 

• Temporal complexity: Any web archiving program is running numerous crawls at various 
frequencies across many collections. While a start-date and end-date can be associated 
with a specific crawl job, and all WARCs have an associated timestamp at which the file 
was created, exploration of use cases quickly exposes an extensive set of complexities 
associated with web archiving that can prove challenging to building data models 
supporting APIs. A crawl job may not include all the seeds in a collection, so crawl job 
and collection are not 1-to-1 metadata points. While a crawl has a start- and end-date,
because of how WARC files are written, and because the WARC timestamp represents
the time the file was “opened” for archival information to be written to it, a WARC can
have data in it that was “written” to the WARC after the crawl job period “ended” (i.e.
the crawler stopped finding new URLs to archive, but still had a queue of URLs to write
to a file). So this “lag” period can make any time-based query parameters somewhat
slippery. Though perhaps a niche case due to the multiple technologies and formats
involved in the web archiving process, it is one that makes developing finite API
parameters a challenge. To address this, the Archive-It API included the ability to query
WARCs by separate date ranges associated with the crawl job and with the WARC
creation date itself, i.e. “give me all the WARCs associated with all the crawls within this 
time range” or “give me all the WARCs from this time range even if part of a crawl job 
time range falls outside of it.” 

• Packaging and formats: WARC files (the ISO standard for web archives) are rather
unique in that they conventionally represent the crawling process, not a specific
information package like a document or an image. A WARC can contain millions of tiny
files or just one big file. Essentially, the crawler “opens” a new file, writes files to it as it
crawls and downloads or streams them, then “closes” the file when it reaches a certain
size, generally 1 GB. This means that one archived website may be spread across many
WARC files and that one WARC file may contain many websites. Data transfer APIs 
need to be created based on WARC files and associated metadata. Supporting transfer 
APIs based specifically on data potentially consisting of only parts of WARC files
requires extracting that data from existing WARCs and writing that data to new WARC 
files. This is too complex and computationally expensive for most use cases or 
institutions beyond those like the Internet Archive with high-performance computing 
clusters. The other method for this sort of transfer is by directly downloading the 
individual resources through an access interface (basically scraping a web archive), 
which is already possible, but also introduces technical challenges and provenance 
confusion. The Archive-It API addressed this issue by introducing a data point in the
API’s response that indicates whether the set of results from a query “includes-extra”, meaning,
essentially, that the list of WARCs includes the data requested by an API request but also
includes other data within these WARCs that does not conform to the given query, such as a
query for a specific seed (whose totality of data may only fill a portion of a WARC file).

• Crawl granularity: As suggested above, associating a specific “crawl job” with a set of 
seed URLs, configurations, timestamps, and other technical metadata points can be 
complex. In the case of Archive-It’s build of the data transfer API, some historical crawls 
were not associated with a “crawl job identifier,” a data point introduced later in the 
service, and since Archive-It (and other web archives) often contain historical web data 
crawled via different services and transferred, later, to the repository, this data will also 
lack a crawl job identifier, since crawl-IDs are internally determined. As a consequence,
API query parameters based on crawl/job IDs may potentially only be applicable to some 
portions of a total collection. Therefore, web archive data transfer APIs need to enable 
query parameters that account for the incompleteness of other query parameters. 

• Results delivery: Collections may contain tens or hundreds of thousands of WARC files. 
Delivering extensive metadata about each individual file in such a collection introduces 
performance challenges, given the volume of data returned for a query. Thus an API 
needs to support unexpected functionality such as pagination. In addition, to support the 
research use cases outlined above, the API needed to also return results related to 
derivative files, such as CDX, WAT, or WANE files. But since these files are not
preservation formats (they can always be regenerated from WARCs, so they need not
be preserved long-term), their existence in results may be ephemeral, since these results
contain links to on-disk locations which, for derivative files, may only be temporary.
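
The temporal, packaging, and crawl-granularity points above can be illustrated with a small sketch. It is a minimal in-memory Python model, not the Archive-It implementation: the `WarcFile` record, its field names, and the `query` parameters are hypothetical names chosen for illustration, and the sample files are invented. It shows how filtering on the crawl job's start date and filtering on the WARC creation date can return different files for the same date range, how a file with no recorded crawl job drops out of crawl-based queries, and how an “includes-extra”-style flag might be derived for a seed-based query.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class WarcFile:
    # All field names are hypothetical, for illustration only.
    filename: str
    created: datetime                # when the WARC was "opened" for writing
    crawl_start: Optional[datetime]  # None when no crawl job is recorded
    seeds: frozenset                 # seeds that have records in this file


def in_range(value, after, before):
    """True when after <= value < before; a bound of None is open-ended."""
    if value is None:
        return False
    return (after is None or value >= after) and (before is None or value < before)


def query(files, created=None, crawl_start=None, seed=None):
    """Filter files; `created` and `crawl_start` are (after, before) tuples."""
    matched = [
        f for f in files
        if (created is None or in_range(f.created, *created))
        and (crawl_start is None or in_range(f.crawl_start, *crawl_start))
        and (seed is None or seed in f.seeds)
    ]
    # Flag results whose files also hold data for seeds that were not requested.
    includes_extra = seed is not None and any(len(f.seeds) > 1 for f in matched)
    return {"includes-extra": includes_extra, "files": [f.filename for f in matched]}


# A crawl that started on 1 March wrote its second WARC just after midnight.
files = [
    WarcFile("a.warc.gz", datetime(2017, 3, 1, 23, 50), datetime(2017, 3, 1, 9),
             frozenset({"example.org"})),
    WarcFile("b.warc.gz", datetime(2017, 3, 2, 0, 10), datetime(2017, 3, 1, 9),
             frozenset({"example.org", "example.com"})),
    WarcFile("c.warc.gz", datetime(2016, 1, 1), None, frozenset({"example.net"})),
]
march_first = (datetime(2017, 3, 1), datetime(2017, 3, 2))

by_crawl = query(files, crawl_start=march_first)  # both WARCs from the crawl
by_created = query(files, created=march_first)    # only the WARC opened that day
by_seed = query(files, seed="example.org")        # b.warc.gz also holds example.com
```

In this toy data the crawl-based query returns two files and the creation-date query returns one, which is the “lag” effect described above; the third file, lacking a crawl job, can only be reached through the creation-date or seed parameters.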

While some of the points outlined here may delve into technical minutiae, they illustrate the 
broader challenges of pairing user expectations around features and functionality and the 
technical contingencies of building APIs based on tools and processes that are often obscure, 
complex, or unknown to archive managers or downstream users. Illuminating and exploring 
some of these challenges was a specific mandate of this research and development project, and 
many of these issues only emerged through technical development, not through requirements
gathering or scenario planning, so better understanding of these issues can be considered a 
successful outcome of the technical work of the project. 
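
As one illustration of the results-delivery issue discussed above, the sketch below shows how a client might walk a paginated result set. It is a hedged example rather than the project's client code: the `next` and `files` keys, the page identifiers, and the in-memory `PAGES` stand-in for an HTTP call are all assumptions made so that the example runs without a network connection.

```python
from typing import Callable, Iterator, Optional


def iter_files(first_url: str, fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield file records from every page, following each page's `next` link."""
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("files", [])
        url = page.get("next")  # None (or missing) on the last page


# In-memory stand-in for an HTTP client; the page names and records are made up.
PAGES = {
    "page-1": {"next": "page-2",
               "files": [{"filename": "a.warc.gz"}, {"filename": "b.warc.gz"}]},
    "page-2": {"next": None,
               "files": [{"filename": "c.warc.gz"}]},
}

filenames = [record["filename"] for record in iter_files("page-1", PAGES.__getitem__)]
```

Because the generator fetches one page at a time, a collection with hundreds of thousands of files never has to be held in a single response. A real client would replace `PAGES.__getitem__` with an authenticated HTTP request and should expect that locations listed for derivative files may have expired by the time they are downloaded.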

3) How can better interoperability of web archiving systems support new forms of access 
and research? 

Though funded in the “Research” project category, the grant proposal included significant
engineering and development work intended both for R&D purposes and to build
sustainable, at-scale, in-production systems. Given the size and complexity of the web archiving
endeavour, technical work that cannot scale to production-level implementation, while 
informative, is of minimal value to expanding our collective archive of historically-valuable 
materials published on, and archived from, the web. Thus, the project’s scope of work features 
the requirements development, planning, build out, testing, iterative improvement, and 
production release of a number of data transfer APIs. All these APIs are open-source and are 
documented and can be used, forked, and improved by the broader community. 

The Archive-It data transfer API is fully documented in the WASAPI project’s GitHub repository
(https://github.com/WASAPI-Community/data-transfer-apis). This API is already in use by
dozens of Archive-It partners who actively use it to transfer their data collected using Archive-It
into their own local preservation repositories, and it has already led
many partners, even beyond the grant, to build local tools to interact with the API. The build-out
of this API included features beyond the scope of the grant work, including the ability to submit
a job to request the creation of research datasets related to a collection and the ability for the API
to notify a requestor when a job is completed and then provide, via the API, the
information related to those research datasets (whose data model generally conforms to that of
the corresponding preservation file), essentially enabling a research-datasets-as-a-service
feature as part of the transfer API.
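
The submit-then-retrieve pattern described above can be sketched as follows. This is a simplified illustration, not the Archive-It service: `FakeJobService`, the state names, the `build-wat` function label, and the result filename are hypothetical, and the sketch polls for completion, whereas the text describes the API notifying the requestor.

```python
import itertools
import time


class FakeJobService:
    """In-memory stand-in for a dataset-job API; not the real service."""

    def __init__(self, polls_until_done=2):
        self._jobs = {}
        self._tokens = itertools.count(1)
        self._polls_until_done = polls_until_done

    def submit(self, collection, function):
        """Register a job and return its token."""
        token = next(self._tokens)
        self._jobs[token] = {"collection": collection, "function": function, "polls": 0}
        return token

    def state(self, token):
        """Report 'running' for the first few polls, then 'complete'."""
        job = self._jobs[token]
        job["polls"] += 1
        return "complete" if job["polls"] > self._polls_until_done else "running"

    def result(self, token):
        """Return the (made-up) derivative files produced by the job."""
        job = self._jobs[token]
        return [f"collection-{job['collection']}.{job['function']}.gz"]


def request_dataset(service, collection, function, poll_seconds=0.0, max_polls=10):
    """Submit a dataset job, wait for it to complete, and return its results."""
    token = service.submit(collection, function)
    for _ in range(max_polls):
        if service.state(token) == "complete":
            return service.result(token)
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {token} did not complete after {max_polls} polls")


dataset_files = request_dataset(FakeJobService(), collection=1234, function="build-wat")
```

Bounding the number of polls keeps a stalled job from blocking a local workflow indefinitely; a notification-based client would replace the loop with a callback or message handler.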

The Mellon-funded LOCKSS software re-architecture will result in a LOCKSS system re-built 
around API-based web service components and an underlying WARC data store (regardless of 
the original format of the source content). The WASAPI data transfer APIs provide a common 
and predictable interface for the export of content out of a LOCKSS system. With the foundation 
of a common format and APIs, consumers of data from LOCKSS systems can capitalize on the 
growing array of tools, workflows, and expertise around WARC-stored content and WARC 
derivative formats. The University of North Texas has also built an open-source local utility and 
tested it with production transfers of web data between institutions. Similarly, Stanford
University Libraries (SUL) continues to invest in automation to streamline the flow of web archives created
using Archive-It into local preservation, access, and discovery systems. The WASAPI data
transfer APIs allow for easier and more robust integration of this hybrid and distributed
infrastructure, freeing developers to focus on where they can add particular value to their
web archive collections, through enhanced and integrated local access and discovery services.
Rutgers University has built a tool explicitly for researchers that uses the WASAPI API to find 
files, request datasets, download those datasets, run local data mining and data analysis scripts, 
and generate a data visualization output (this workflow is outlined in this presentation slide:
https://docs.google.com/presentation/d/llAjeNmnnJb_lLYofqR-ZlqcqxKZ_ith057vCPWdPFt4/edit#slide=id.g2b9d777c4c_0_86).
This tool, like all the others in this project, is
open-source, and it has already powered existing research efforts and enabled data visualizations
featured in scholarly publications and presentations using computational analysis of web
archives to support scholarly work.

Lastly, an unexpected but exciting occurrence over the course of the project’s Year Two work
was the start of development of second-level services by non-grant-partner institutions using the
project’s production APIs. The Archives Unleashed project (http://archivesunleashed.org/) has
built a researcher workbench platform that utilizes the WASAPI API for transferring data from
Archive-It collections to their infrastructure for local processing to support research use of 
collections. As well, grant team members have had calls with a joint project of the 
COPPUL/OCUL academic library consortium in Canada that is working to provide second-level 
preservation services to its members built around transfer of the Archive-It and LOCKSS 
collections into a shared repository system. Other projects, such as Webrecorder and Islandora,
have tested the WASAPI APIs for exploring system and service integrations. Supporting such
community-wide efforts towards interoperability across tools and services was, of course, one of
the main goals of this grant; however, that such work began during the grant period itself,
instead of afterwards, was both a pleasant surprise and a direct affirmation of the grant’s
original statement of need and argumentation. Supporting, extending, and scaling this ecosystem
of interoperability across web archiving systems will continue in Year Three and beyond. 

Changes 

The key change was the approval of a one-year, no-cost extension of the grant. This extension 
allowed for some published deliverables to be scheduled for a Year Three release instead of the
original Year Two release. Otherwise, there were no major changes in key personnel, budget
allocation, scope, or schedule during Year Two. The only notable event in this regard was the
expansion of scope to include the work of overlapping projects within the grant partners’ own
institutions (no funding is needed or was requested for this addition). This includes documenting 
affiliate APIs by Internet Archive that complement the WASAPI work and help establish 
WASAPI as the hub of API and web archiving systems interoperability collaboration. 

Findings or Accomplishments During This Reporting Period

Findings and accomplishments are examined in detail in the reporting sections above, but can be
categorized at a high level to align with the project’s areas of work.


Community Models for Collaborative Web Archiving Technology Development 
The National Symposium demonstrated that the community needs a dedicated, national-level
annual web archiving meeting. Current technical work is fractured across institutions
and minimally documented or promoted. The features of the social and development architecture 
to best support cooperative development were detailed. Local use cases may vary, and the suite 
of web archiving tools benefiting from collaborative development is large, but there is 
community willingness to invest in the core technologies and combine them with other digital 
library tools to achieve workflow and resource efficiencies and advance the quality and scale of 
our collective efforts to collect and preserve historical records published on the web. 

Needs and Requirements for Web Archive Data Transfer and Other APIs 
Development, iteration, and production deployment of numerous transfer APIs, local utilities, 
and research data mining tools provided ample insight into the complexities and possibilities for 
further interoperability between web archiving and digital library systems, as well as how these 
can inform research use cases. Though staffing available for development on web
archiving tools remains extremely limited, benefits of scale are possible when well organized
through structured community building and collaboration. Modularity in architecture, efficiency
in performance, flexibility in data modeling, and other characteristics will prove essential in
planning future engineering efforts to support WASAPI’s goals, as shown by project work.

Potential for new Web Archiving APIs to Support New Technologies and Uses 
As noted above, grant successes include unforeseen system integrations already emerging during
the grant’s research and development phase, including preservation services, data mining 
applications, harvesting tools, access interfaces, and more. This points to a significant potential 
for connecting a variety of systems that can support all aspects of the web archiving lifecycle,
not just those targeted in this project. This potential “systems interoperability landscape” is better 
understood, operationally and technically, and is maturing, as a result of the grant’s work. 
Additionally, data mining tools in beta-stage development as part of the project are already 
supporting the work of researchers in their analytical work and appearing in presentations and 
scholarly publications. The grant’s success so far has enabled both these trends and has initiated 
a community-wide effort to sustain the WASAPI project’s continued accomplishment in
advancing web archiving and furthering libraries’ ability to preserve and provide access to our 
nation’s historical record in born-digital form. 

Key Project Resources 

https://github.com/WASAPI-Community/data-transfer-apis

https://archive.org/details/wasapi 